feat: add retireval_top_n to config in env #11132

Merged
crazywoola merged 3 commits into langgenius:main from retrieval_top_n
Nov 30, 2024

ProseGuys
Contributor

@ProseGuys ProseGuys commented Nov 26, 2024

Summary

Solution: Add a new retrieval parameter, top_n, as an environment variable. During retrieval, the default top_k parameter is still used to obtain the top_k most relevant slices. During reranking, the top_n parameter determines how many of those slices are returned.
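
The two-stage flow described in the summary can be sketched roughly as follows. This is a minimal illustration, not the code from this PR: the `Slice` type, the `retrieve`, `rerank`, and `read_top_n` helpers, the environment variable name `RETRIEVAL_TOP_N`, and the fallback to a caller-supplied default are all assumptions made for the example.

```python
import os
from dataclasses import dataclass


@dataclass
class Slice:
    text: str
    score: float  # relevance score; higher means more relevant


def retrieve(candidates: list[Slice], top_k: int) -> list[Slice]:
    """Stage 1: keep the top_k slices by retrieval score."""
    return sorted(candidates, key=lambda s: s.score, reverse=True)[:top_k]


def rerank(slices: list[Slice], rerank_scores: dict[str, float], top_n: int) -> list[Slice]:
    """Stage 2: re-score the retrieved slices and return only the top_n."""
    rescored = [Slice(s.text, rerank_scores.get(s.text, 0.0)) for s in slices]
    return sorted(rescored, key=lambda s: s.score, reverse=True)[:top_n]


def read_top_n(default: int) -> int:
    """Read top_n from a hypothetical RETRIEVAL_TOP_N environment variable.

    Falls back to `default` when the variable is unset, non-numeric, or zero.
    """
    raw = os.environ.get("RETRIEVAL_TOP_N", "").strip()
    if raw.isdigit() and int(raw) > 0:
        return int(raw)
    return default


# Usage: retrieve broadly with top_k, then narrow with top_n after reranking.
candidates = [Slice("a", 0.9), Slice("b", 0.8), Slice("c", 0.7), Slice("d", 0.1)]
stage_one = retrieve(candidates, top_k=3)
final = rerank(stage_one, {"a": 0.2, "b": 0.95, "c": 0.6, "d": 0.99}, top_n=2)
```

Keeping top_k larger than top_n lets the reranker see more candidates than are finally returned, which is the distinction the new parameter is meant to expose.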

Resolves #11068

Checklist

Important

Please review the checklist below before submitting your pull request.

  • This change requires a documentation update, included: Dify Document
  • I understand that this PR may be closed in case there was no previous discussion or issues. (This doesn't apply to typos!)
  • I've added a test for each change that was introduced, and I tried as much as possible to make a single atomic change.
  • I've updated the documentation accordingly.
  • I ran dev/reformat(backend) and cd web && npx lint-staged(frontend) to appease the lint gods

@dosubot dosubot bot added the size:S, 💪 enhancement, and 📚 documentation labels Nov 26, 2024
Review comment on api/.env.example (outdated, resolved)
@ProseGuys ProseGuys requested a review from crazywoola November 27, 2024 01:31
@nadirvishun

nadirvishun commented Nov 28, 2024

Currently, in my own project, I am also using similar environment variables to simply distinguish between topK and topN. However, a better approach would be to differentiate them on the knowledge base configuration interface (and also add a threshold parameter), rather than handling it through coding.
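
A per-knowledge-base configuration of the kind suggested here might look like the sketch below. It is purely illustrative: `RetrievalSettings`, its field names and defaults, and the `select` helper are hypothetical and do not reflect Dify's actual schema or UI.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class RetrievalSettings:
    """Hypothetical per-knowledge-base settings (not Dify's real schema)."""

    top_k: int = 10  # how many slices the retrieval stage fetches
    top_n: int = 3  # how many slices survive reranking
    score_threshold: Optional[float] = None  # minimum rerank score, if set

    def __post_init__(self) -> None:
        if self.top_k < 1 or self.top_n < 1:
            raise ValueError("top_k and top_n must be positive")
        if self.top_n > self.top_k:
            raise ValueError("top_n cannot exceed top_k")


def select(
    scored: list[tuple[str, float]], settings: RetrievalSettings
) -> list[tuple[str, float]]:
    """Drop slices below the threshold, then keep the top_n by rerank score."""
    kept = [
        item
        for item in scored
        if settings.score_threshold is None or item[1] >= settings.score_threshold
    ]
    kept.sort(key=lambda item: item[1], reverse=True)
    return kept[: settings.top_n]
```

Storing these values per knowledge base, rather than in one process-wide environment variable, would let each knowledge base tune its own cut-offs without a redeploy.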

@ProseGuys
Contributor Author

> Currently, in my own project, I am also using similar environment variables to simply distinguish between topK and topN. However, a better approach would be to differentiate them on the knowledge base configuration interface (and also add a threshold parameter), rather than handling it through coding.

Yes, this is only a temporary solution for users with a RAG background. Adding the relevant parameters to the UI would involve product design, so we have not made any changes there.

@dosubot dosubot bot added the lgtm label Nov 30, 2024
@crazywoola crazywoola merged commit f9c2aa7 into langgenius:main Nov 30, 2024
7 checks passed
@ProseGuys ProseGuys deleted the retrieval_top_n branch December 2, 2024 01:13
@@ -3,6 +3,7 @@

from flask import Flask, current_app

from configs import DifyConfig
Member

This is the wrong way to use the config object.

Contributor Author

Sorry, I made a stupid mistake that had a bad impact on you, but the revised submission

Member

@ProseGuys We reverted this change in this version and reopened the related issue.

Labels
  • 📚 documentation: Improvements or additions to documentation
  • 💪 enhancement: New feature or request
  • lgtm: This PR has been approved by a maintainer
  • size:S: This PR changes 10-29 lines, ignoring generated files.
Development

Successfully merging this pull request may close these issues.

The Rerank model in RAG needs to support independent score_threshold and top_k
4 participants